ABSTRACT

The 40% Arecibo Legacy Fast ALFA (ALFALFA) survey catalog (α.40) of
~10,150 HI-selected galaxies is used to analyze the clustering properties of
gas-rich galaxies. By employing the Landy-Szalay estimator and a full covariance
analysis for the two-point galaxy-galaxy correlation function, we obtain
the real-space correlation function and model it as a power law, ξ(r) = (r/r_0)^-γ,
on scales < 10 h^-1 Mpc. As the largest sample of blindly HI-selected galaxies
to date, α.40 provides a detailed understanding of the clustering of this population.
We find γ = 1.51 ± 0.09 and r_0 = 3.3 (+0.3, -0.2) h^-1 Mpc, reinforcing
the understanding that gas-rich galaxies represent the most weakly clustered
galaxy population known; we also observe a departure from a pure power-law
shape at intermediate scales, as predicted in ΛCDM halo occupation distribution
models. Furthermore, we measure the bias parameter for the α.40 galaxy
sample and find that HI galaxies are severely antibiased on small scales, but
only weakly antibiased on large scales. The robust measurement of the correlation
function for gas-rich galaxies obtained via the α.40 sample constrains
models of the distribution of HI in simulated galaxies, and will be employed
to better understand the role of gas in environmentally dependent galaxy evolution.

Subject headings: galaxies: distances and redshifts, clusters — radio lines: galaxies
— surveys — large-scale structure of universe
1. Introduction



Galaxies selected by their neutral hydrogen are known to be less clustered than their
optically selected counterparts (Basilakos et al. 2007; Meyer et al. 2007 for HIPASS) and
less likely to be found in dense environments. Given the anticipated cosmological uses
of 21 cm galaxy redshift surveys, it is important to understand the clustering characteristics
of this population of galaxies. Specifically, 21 cm line surveys obtain detections and
redshifts concurrently, along with HI masses, reducing their expense and eliminating the
need for follow-up observations. Such surveys are also able to probe galaxy populations
irrespective of luminosity, stellar mass, or dust extinction. Additionally, such surveys are
sensitive to low-luminosity dwarf systems, which tend to be gas-dominated (Geha et al.
2006). Conversely, such surveys are biased against clusters, the most luminous galaxies,
and the 'red and dead' galaxy population.

Given the lack of large and deep HI-selected galaxy samples to date (the HIPASS
main catalog and its northern extension contain, respectively, 4,315 and 1,002 galaxies;
Meyer et al. 2004; Wong et al. 2006), this population, its evolution, and its bias compared
to dark matter are poorly understood. The selection of these galaxies is strongly
limited in redshift: targeted observations can only extend to z ~ 0.2 (Catinella et al.
2008; Freudling et al. 2011), while the Arecibo Legacy Fast ALFA (ALFALFA) survey is
limited to z < 0.06. At the same time, this population is poised to become the standard
for cosmological measurements based on observations of resolved galaxies as well as
intensity mapping. For example, galaxy redshift surveys taking advantage of the 21 cm
transition of neutral hydrogen undertaken with instruments like the Square Kilometer
Array (SKA) would potentially provide constraints on the dark energy equation of state
and its variation with redshift (Abdalla et al. 2010; Myers et al. 2009).



The differences in neutral hydrogen distribution between galaxies in clusters and
those in the field are unevenly understood, with proposed explanations spanning from
'nature' (i.e., gas-rich galaxies form in low-concentration dark matter halos and/or in
underdense environments) to 'nurture' (i.e., processes that occur after formation either deplete
the HI gas from halos, through ram-pressure stripping or galaxy interactions, or enrich HI
reservoirs, through cold accretion). The reality is a combination of many processes and initial
conditions. Probing the relationship between cold gas mass and other properties known
to be anticorrelated with clustering, such as spiral morphology, late type (Norberg et al.
2002), active star formation (Kauffmann et al. 2004), and blue colors (Zehavi et al. 2005),
may help to better articulate the influence of environment on galaxy evolution while also
constraining the populations to which future large 21 cm line surveys will be sensitive.

^The Arecibo Observatory is operated by SRI International under a cooperative agreement with the Na-

Most work directly related to the clustering of gas-rich galaxies came out of the
HIPASS survey. Meyer et al. (2007) and Basilakos et al. (2007) both identified the HIPASS
HI-selected sample as the most weakly clustered population of galaxies known, but their
results regarding the mass dependence of the clustering were in conflict. While the HIPASS
team found a statistically insignificant difference between 'high' and 'low' HI mass galaxies,
Basilakos et al. (2007) found that high-mass galaxies clustered more strongly. More
recently, Passmoor et al. (2011) compare the ALFALFA and HIPASS projected correlation
functions and angular correlation functions, and find that they are similar but that
ALFALFA's sensitivity to low-mass galaxies makes that sample more strongly antibiased
relative to dark matter. However, Passmoor et al. (2011) use only the ALFALFA catalogs
published in Giovanelli et al. (2007), Saintonge et al. (2008), and Kent et al. (2008) (~1,800
galaxies), despite several other ALFALFA catalogs being available at the time of publication;
these catalogs include the Virgo cluster and the Pisces-Perseus foreground void and cover
small volumes, so they do not comprise a representative sample. Passmoor et al. (2011) are
therefore severely limited in their ability to make broader claims about the population.

The excellent sensitivity and large sample size of the α.40 sample allow us to probe
the clustering characteristics of HI-selected galaxies through the two-point galaxy-galaxy
correlation function.

In the following sections, we describe our dataset (Section 2) and the methodology
used to measure the galaxy-galaxy correlation function (Section 3). We then estimate the
real-space correlation function, both assuming a power law and by direct inversion, and
investigate the impact of methodology choices in Section 4. In Section 5, we compare the
ALFALFA clustering results to those found in simulations that have, for the first time,
attempted to assign reasonable cold HI gas masses to simulated galaxies, while also
discussing the results in context, before concluding in Section 6.



2. Dataset 
2.1. ALFALFA α.40 Sample



The ongoing ALFALFA survey is completing a census of galaxies in the local universe,
out to z ~ 0.06, using the seven-pixel ALFA receiver at the Arecibo Observatory
to detect the 21 cm line of neutral hydrogen. Compared to previous blind neutral hydrogen
surveys (e.g., HIPASS), ALFALFA's enhanced sensitivity, detection centroiding,
volume, and sample size, resulting in a cosmologically representative sample, make it
ideally suited for an accurate measurement of the correlation function of gas-rich galaxies.

The sample used here includes the sky coverage of the α.40 sample recently presented
in Haynes et al. (2011), referred to as α.40 because it includes the data extracted
from coverage of 40% of ALFALFA's footprint. The statistical completeness and noise
characteristics of the ALFALFA source catalog are well understood and have been discussed
extensively elsewhere. Further details may be found in Saintonge (2007),
Martin et al. (2010), and Haynes et al. (2011), which include discussions of the
characteristics of the α.40 sample and the sensitivity of the ALFALFA survey. In particular,
Haynes et al. (2011) discuss the impacts of various volume restrictions on the derivation of
the HI mass function. Here we summarize the salient points.

Confidently detected sources in ALFALFA are assigned one of three object codes,
where Code 1 refers to a reliable extragalactic detection with a high S/N (> 6.5). For the
sample used here, we neglect the other objects, Code 2 and Code 9 sources. Code 9 sources
are high velocity clouds (HVCs) of hydrogen gas in the vicinity of the Milky Way and are
thus not extragalactic, whereas Code 2 extragalactic sources have lower S/N and are only
included in the catalog because they are corroborated by a known optical source at the
same position and redshift. Furthermore, ALFALFA's ability to detect extragalactic signals
near its redshift limit is degraded by a strong source of terrestrial radio frequency
interference, the FAA radar at the San Juan airport. We therefore include only objects
within 15,000 km s^-1, which results in only a modest loss of source counts.

ALFALFA's sensitivity depends not only on the integrated flux, but also on the 21 cm
spectral profile width W50 (km s^-1). Because the HI mass of a source is a function
of its distance and integrated flux, integrated flux at a given distance can be thought of
as a proxy for mass. The survey therefore is not volume-, flux-, or mass-limited, and the
reconstructed selection function must take this complex sensitivity into account. Thus, when
the sample is viewed as the distribution of galaxy masses as a function of distance, as in
Figure 3 of Haynes et al. (2011), it is clear that α.40 is sensitive to very low HI mass
galaxies nearby but only to significant masses at greater distances.

We refer the reader to Figure 1 of Martin et al. (2010), which displays the dependence
of ALFALFA's sensitivity on both integrated flux and profile width. The HIPASS survey
recovered sources with the same dependence on these two parameters. ALFALFA is more
sensitive than HIPASS, with a 5σ detection limit of 0.72 Jy km s^-1 for a source with profile
width 200 km s^-1 in ALFALFA, compared to 5.6 Jy km s^-1 for the same source in HIPASS.
In Martin et al. (2010), we fit a linear relationship between integrated flux and profile
width, with a break at 400 km s^-1, which describes the sensitivity of the survey. We use
that same relationship and the selection function derived in that work throughout this
paper.
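As a worked illustration of why a fixed flux limit becomes a distance-dependent mass limit, the sketch below uses the standard optically thin relation M_HI/M_sun = 2.356 × 10^5 D^2 S_int (D in Mpc, S_int in Jy km s^-1). That relation is not written out in this section and is assumed here; the 0.72 Jy km s^-1 threshold is the 5σ limit quoted above for W50 = 200 km s^-1, and the sketch deliberately ignores the width dependence of the real selection function.

```python
def hi_mass(s_int_jy_kms, distance_mpc):
    """HI mass in solar masses for an optically thin source (standard relation, assumed)."""
    return 2.356e5 * distance_mpc ** 2 * s_int_jy_kms

# 5-sigma ALFALFA limit quoted in the text for a W50 = 200 km/s profile
S_LIMIT_W200 = 0.72  # Jy km/s

def min_detectable_mass(distance_mpc, s_limit=S_LIMIT_W200):
    """Smallest HI mass that reaches the flux limit at a given distance (width held fixed)."""
    return hi_mass(s_limit, distance_mpc)

if __name__ == "__main__":
    for d in (10.0, 50.0, 200.0):
        print(f"D = {d:6.1f} Mpc -> M_HI,min ~ {min_detectable_mass(d):.2e} M_sun")
```

Because the limiting mass scales as D^2, it grows by a factor of 400 between 10 and 200 Mpc, which is the behavior described for Figure 3 of Haynes et al. (2011).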

The trimmed sample includes only Code 1 objects from the α.40 catalog within 15,000
km s^-1, for a total of ~10,150 galaxies used in measuring the two-point correlation function.



2.2. Selection Function 



The distance dependence of the selection function of α.40 was determined using the
2DSWML (two-dimensional stepwise maximum likelihood) method, described fully in
Appendix B of Martin et al. (2010). 2DSWML is related to the SWML (stepwise maximum
likelihood) approach, but modified to account for the two-dimensional dependence of the
survey sensitivity on both integrated flux and profile width. 2DSWML splits the
distribution of galaxy masses and profile widths in the α.40 sample into logarithmic bins,
and then calculates the best-fit HI mass function (analogous to a luminosity function)
that maximizes the joint likelihood of detecting all galaxies in the sample. This approach
simultaneously measures the selection function for each detected galaxy in the
sample. In this work, we use that selection function S(D_i) for each galaxy i with a
known distance D_i in Mpc.

For application to the correlation function, S(D_i) is calculated for every galaxy in the
sample, and the resulting S(D) is then smoothed. This smoothed selection function can
additionally be combined with an HI mass function to predict the number of galaxies
of a given mass expected to be found in the survey, or to predict the
redshift distribution of the survey under an assumption of homogeneity. Figure 1,
previously published in Haynes et al. (2011), shows a histogram of the α.40 redshift
distribution, with peaks and dips representing clusters and voids, respectively, along with an
overplotted prediction based on the selection function and a non-clustered Universe.

Disagreements between the prediction and the observations are due both to the existence
of large-scale structure in the survey volume and to the loss of survey sensitivity at
certain velocities due to radio frequency interference. This contamination is quantified as
a percentage of survey coverage at a given heliocentric velocity, or spectral weighting, as
discussed in Martin et al. (2010) and earlier publications from the ALFALFA survey. For
the purposes described here, the weights map has been translated into the CMB reference
frame in order to most accurately model the predicted ALFALFA galaxy distribution (see
Section 3). The selection function is used both for the creation of the random samples
for estimation of the correlation function and for the weighting of pair counts in that
estimate.

Fig. 1. — The observed redshift distribution of α.40 galaxies (histogram) compared to the
expected distribution (solid line) obtained via the survey's selection function. This Figure
was previously published as Figure 21 in Haynes et al. (2011).



3. Method: Estimation of ξ(r) and Error Analysis



We measure the correlation function, ξ(σ, π), in bins of on-sky (σ) and radial (π)
redshift-space separations, using the observed velocities. Given the redshift extent of
α.40, we have translated measured galaxy velocities from the heliocentric frame of reference
to the CMB frame of reference using Lineweaver et al. (1996). For two galaxies i and
j, these separations are:



$$\sigma = \frac{V_i + V_j}{H_0} \tan(\theta/2) \qquad (1)$$

and

$$\pi = \frac{|V_i - V_j|}{H_0} \qquad (2)$$



where θ is the angular separation of the two galaxies on the sky, V_i and V_j are defined
in the CMB reference frame, and H_0 is expressed in units of h (H_0 = 100h km s^-1 Mpc^-1).
We adopt the Davis & Peebles (1983) definitions for σ and π rather than those used in
Fisher et al. (1994a), but note that Guzzo et al. (1997) found negligible differences for a
sample with similar redshift extent and covering a portion of the α.40 survey volume.
Because we are using a sample of galaxies in the very local universe, we neglect cosmological
corrections to the distances. This choice is further supported by our focus on small relative
pair distances (always less than 30 h^-1 Mpc) and our interest in projected quantities, where
such distance errors, already very small in magnitude, are absorbed in the projection.
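The two separations in Equations (1) and (2) are simple to compute per pair. The following is a minimal sketch, assuming velocities already converted to the CMB frame and RA/Dec given in radians; the function names and the haversine helper are illustrative choices, not part of the ALFALFA pipeline.

```python
import math

H0 = 100.0  # km s^-1 Mpc^-1 per unit h, so separations come out in h^-1 Mpc

def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in radians (inputs in radians), haversine form."""
    hav = (math.sin((dec2 - dec1) / 2.0) ** 2
           + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return 2.0 * math.asin(math.sqrt(min(1.0, hav)))

def pair_separations(v_i, v_j, theta):
    """Return (sigma, pi) in h^-1 Mpc for CMB-frame velocities v_i, v_j in km/s."""
    sigma = (v_i + v_j) / H0 * math.tan(theta / 2.0)   # Equation (1)
    pi = abs(v_i - v_j) / H0                            # Equation (2)
    return sigma, pi

if __name__ == "__main__":
    theta = angular_separation(math.radians(150.0), math.radians(10.0),
                               math.radians(150.5), math.radians(10.2))
    print(pair_separations(5000.0, 5300.0, theta))
```

The haversine form is used because it stays accurate for the very small angles that dominate close pairs.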

Our ultimate goal is to measure ξ(r), the real-space correlation function, through the
observables actually available to us, that is, ξ(σ, π). In particular, we are interested in
modeling the power-law shape of the correlation function up to ~10 h^-1 Mpc, beyond
which the correlation function is known to diverge from a simple power law.

Since ξ measures not simply the probability distribution of galaxy separations in a
sample, but the excess probability compared to a homogeneously distributed sample,
estimators compare the observed galaxy distribution to a random distribution designed to
reflect the survey's observational limitations but to exclude the effects of large-scale
structure. This is straightforwardly accomplished by comparing the number of pairs in (σ, π)
separation bins from the observed sample to the pair counts from the random sample.
In the sections that follow, we describe this method and the corresponding error
analysis in greater detail.



3.1. Pairwise Estimation 



We adopt the Landy-Szalay pairwise estimator (Landy & Szalay 1993) for the correlation
function. The Landy-Szalay normalization of data-data (DD), random-random (RR),
and data-random (DR) pair counts allows us to construct a random catalog that contains
many more objects than the observed data catalog, thereby reducing the shot noise
introduced by the random set. The Landy-Szalay estimator is constructed from these
normalized counts:



$$\xi(\sigma, \pi) = \frac{DD(\sigma, \pi) - 2\,DR(\sigma, \pi) + RR(\sigma, \pi)}{RR(\sigma, \pi)} \qquad (3)$$
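A brute-force sketch of Equation (3) with weighted, normalized pair counts is given below. To keep it short, it bins pairs by 3D separation rather than by (σ, π), and it holds all N × M distances in memory, so it is only suitable for small toy catalogs; a real measurement would use a tree- or grid-based pair counter. The function names are hypothetical.

```python
import numpy as np

def weighted_pair_counts(a, wa, b, wb, edges, auto):
    """Normalized weighted pair counts in bins of 3D separation (brute force).

    a, b   : (N, 3) and (M, 3) Cartesian positions
    wa, wb : per-object weights, so each pair contributes w_i * w_j
    auto   : True when a and b are the same catalog (count each distinct pair once)
    """
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    w = wa[:, None] * wb[None, :]
    if auto:
        iu = np.triu_indices(len(a), k=1)          # drop self-pairs and duplicates
        d, w = d[iu], w[iu]
        total = (wa.sum() ** 2 - (wa ** 2).sum()) / 2.0
    else:
        d, w = d.ravel(), w.ravel()
        total = wa.sum() * wb.sum()
    counts, _ = np.histogram(d, bins=edges, weights=w)
    return counts / total                           # normalize by total pair weight

def landy_szalay(dd, dr, rr):
    """Equation (3) applied to normalized DD, DR, RR counts."""
    return (dd - 2.0 * dr + rr) / rr
```

On an unclustered toy catalog (data drawn from the same distribution as the randoms), the resulting ξ scatters around zero, which is a quick sanity check of the normalization.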



Because α.40 is not volume-limited, the pair counts must be weighted so that the
measurement is not dominated by galaxies at the peak of the selection function. Following
Meyer et al. (2007) and Hawkins et al. (2003), we apply a weighting w_ij = w_i × w_j for
the contribution of each pair i, j to the Landy-Szalay estimator, given by:
$$w_i = \frac{1.0}{1.0 + 4\pi N_D S(r_i) J_3(s)} \qquad (4)$$

where N_D is the number of galaxies in α.40, S(r_i) is the selection function measured
for α.40 at r_i = cz_CMB,i / H_0, and

$$J_3(s) = \int_0^s s'^2\,\xi(s')\,ds' \qquad (5)$$

is defined in terms of the redshift-space coordinate s = √(σ² + π²).

This expression for J_3 requires an assumed model for ξ(s), but the final measurement
of the correlation function is not sensitive to this assumed input for object weighting; we
assume a power-law form:

$$\xi(s) = \left(\frac{s}{s_0}\right)^{-\gamma} \qquad (6)$$

and we test our robustness by first assuming a fiducial value found for optically
selected samples, s_0 = 5.0 h^-1 Mpc and γ = 1.8, and then iterating to the values of s_0 and γ
measured for α.40. No statistically significant difference is observed through this iterative
process, and we therefore proceed as other authors have, using the fiducial optical values
reported here in our J_3 weighting. Following Fisher et al. (1994a,b), we apply an artificial
cutoff with a maximum value of s = 30 h^-1 Mpc in the expression for J_3.
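With the power-law model of Equation (6), the integral in Equation (5) has a closed form for γ < 3, J_3(s) = s_0^γ s^(3-γ) / (3-γ). The sketch below uses it together with the 30 h^-1 Mpc cutoff and the fiducial s_0 = 5.0 h^-1 Mpc, γ = 1.8. It assumes S(r_i) is normalized so that N_D S(r_i) is a number density in h^3 Mpc^-3; this section does not state the normalization, so treat that as an assumption.

```python
import math

def j3_power_law(s, s0=5.0, gamma=1.8, s_max=30.0):
    """J3(s) = ∫_0^s s'^2 ξ(s') ds' for ξ(s) = (s/s0)^-gamma (Equations 5 and 6).

    The upper limit is capped at s_max, mirroring the 30 h^-1 Mpc cutoff.
    Closed form s0^gamma * s^(3 - gamma) / (3 - gamma), valid for gamma < 3.
    """
    s = min(s, s_max)
    return s0 ** gamma * s ** (3.0 - gamma) / (3.0 - gamma)

def galaxy_weight(n_d, selection_i, s, **j3_kwargs):
    """w_i = 1 / (1 + 4π N_D S(r_i) J3(s)), Equation (4).

    Assumes N_D * S(r_i) is a number density (normalization not given in the text).
    """
    return 1.0 / (1.0 + 4.0 * math.pi * n_d * selection_i * j3_power_law(s, **j3_kwargs))

def pair_weight(w_i, w_j):
    """Weight applied to pair (i, j) in DD, DR, and RR counts."""
    return w_i * w_j
```

Galaxies where the expected local pair density is high (large N_D S(r_i) J_3) receive small weights, which is what keeps the peak of the selection function from dominating the estimate.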



3.2. Random Samples 

We construct random samples that contain 20 times the number of objects in the
α.40 dataset. These random samples are carefully designed to include survey selection
effects while excluding correlations due to large-scale structure. This is accomplished by
predicting the distribution of cz_CMB from the survey selection and HI mass functions (see
Figure 1) and then folding in the loss of volume as a function of velocity due to radio
frequency interference, measured from the spectral weights map in Martin et al. (2010).
Objects in the random set are assigned random sky positions within the right ascension
and declination boundaries of α.40 and are then assigned a redshift drawn from this
predicted distribution. The resulting redshift distribution for one example instance of the
random sample procedure is shown in Figure 2.
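The random-catalog recipe above can be sketched as follows. The RA/Dec box, the velocity binning, and the bin weights are hypothetical stand-ins for the α.40 footprint and for the predicted cz_CMB distribution with the RFI losses already folded in. A footprint made of several disjoint regions (or one that wraps through RA = 0h) would need area-weighted sampling of each region, which is omitted here.

```python
import numpy as np

def make_randoms(n, ra_range_deg, dec_range_deg, cz_edges, cz_weights, rng):
    """Unclustered random catalog inside one RA/Dec box (hypothetical inputs).

    cz_weights: relative number of randoms per velocity bin, assumed to already
    include the selection function and the fractional volume lost to RFI.
    """
    cz_edges = np.asarray(cz_edges, dtype=float)
    ra_lo, ra_hi = np.radians(ra_range_deg)
    ra = rng.uniform(ra_lo, ra_hi, n)
    # Uniform in sin(dec) gives a uniform density per unit solid angle.
    sin_lo, sin_hi = np.sin(np.radians(dec_range_deg))
    dec = np.arcsin(rng.uniform(sin_lo, sin_hi, n))
    # Choose a velocity bin with probability proportional to its weight,
    # then place the object uniformly within that bin.
    p = np.asarray(cz_weights, dtype=float)
    p = p / p.sum()
    idx = rng.choice(len(p), size=n, p=p)
    cz = rng.uniform(cz_edges[idx], cz_edges[idx + 1])
    return np.degrees(ra), np.degrees(dec), cz

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    edges = np.linspace(0.0, 15000.0, 16)
    weights = np.ones(15)
    weights[7] = 0.0                  # e.g. a velocity range entirely lost to RFI
    n_data = 1000                     # hypothetical data-catalog size
    ra, dec, cz = make_randoms(20 * n_data, (120.0, 240.0), (4.0, 16.0), edges, weights, rng)
    print(np.histogram(cz, bins=edges)[0])
```

Calling it with 20 times the data size reproduces the random-to-data ratio quoted above; a zero-weight velocity bin produces the kind of dip seen in Figure 2.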
Fig. 2. — The redshift distribution of the constructed random sample. The dips in the
distribution at ~8,000 km s^-1 are due to radio frequency interference at the Arecibo
Observatory. When data at these frequencies are flagged as bad (and thus ignored in the
processing pipeline), the effective search volume at the corresponding velocities is reduced,
which translates into a reduction in counts in the random samples.
See Martin et al. (2010) for a plot of the average relative weight as a function of velocity
in α.40.



3.3. Error Analysis 

The correlation function is measured in bins of separation. As we iterate towards the
real-space correlation function ξ(r), the correlation function is expressed in several
different coordinate systems, and in every such system the bin counts, and thus the measured
correlation functions, are correlated with one another. Because structures such as clusters
contribute an overabundance of pairs to a set of several bins, the measurement in each bin
is not independent of the others. In plots of the correlation function shown here, we display
the square roots of the on-diagonal elements of the covariance matrix (i.e., the standard
deviations) as uncertainties on each point. However, in order to use our measurement to
estimate the power-law shape of the correlation function of gas-rich galaxies, we must
construct a full covariance matrix and take the off-diagonal elements into account.

To construct the covariance matrix C, we carry out our pair-counting routine on more
than 500 bootstrap resamplings of the data, reusing a single catalog of random objects in
each case. Each of the bootstrap measurements of ξ(σ, π) contains galaxies selected at
random from α.40, with replacement. From this set of realizations, we construct the
covariance matrices for ξ(s), Ξ(σ), and ξ(r). The covariance between two correlation
function bins b_l and b_m is given by:
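$$C(l, m) = \sum_{k=1}^{N_{\rm realizations}} \frac{[\xi_k(b_l) - \bar{\xi}(b_l)]\,[\xi_k(b_m) - \bar{\xi}(b_m)]}{N_{\rm realizations} - 1} \qquad (7)$$

where ξ_k is the measurement from the k-th bootstrap realization and ξ̄ is the mean over
all realizations.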
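Equation (7) is the sample covariance of the per-bin estimates across bootstrap realizations. The sketch below shows only the resampling indices and the covariance step; recomputing the pair counts for each resampled galaxy list against the fixed random catalog is left to whatever pair counter is in use. Function names are illustrative.

```python
import numpy as np

def bootstrap_indices(n_gal, n_boot, rng):
    """Row k lists the galaxies in bootstrap realization k (drawn with replacement)."""
    return rng.integers(0, n_gal, size=(n_boot, n_gal))

def covariance_from_realizations(xi_k):
    """Equation (7). xi_k has shape (N_realizations, N_bins), one row per realization."""
    xi_k = np.asarray(xi_k, dtype=float)
    dev = xi_k - xi_k.mean(axis=0)        # ξ_k(b_l) minus the mean over realizations
    return dev.T @ dev / (xi_k.shape[0] - 1)
```

The result matches `numpy.cov(xi_k, rowvar=False)`, which uses the same N - 1 normalization; writing it out makes the correspondence with Equation (7) explicit.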

The significant off-diagonal elements of the covariance matrix make it difficult to obtain
a power-law model fit by minimizing χ² values weighted only by the variance. The
covariance matrix, however, is not an inescapable property of the data, but depends on
the basis in which the data are projected. In this case, we have some number
of bins N_b representing a set of variables b (bin centers in h^-1 Mpc), and can choose to
work in an orthonormal basis with coordinate axes in which the covariance matrix
C is diagonalized. This basis is defined by the principal component eigenvectors of the
measurement, and we borrow elements of principal component analysis in order to obtain
model parameter fits and uncertainty estimates.

The principal components are linear combinations of the original N_b variables, arranged
such that the first principal component corresponds to an orthonormal axis through
N_b-dimensional space that explains the largest proportion of variance in the dataset.
These principal component vectors are defined by the eigenvectors of the covariance
matrix of the original dataset.

Following Fisher et al. (1994a), we calculate the principal eigenvectors and construct
a diagonalizing matrix, R, the columns of which are these eigenvectors, and a new
covariance matrix, C' = R^T C R, projected in the new basis set. Since all of the covariance
has been accounted for in the definition of the principal components, C' has no off-diagonal
elements, and the variance is captured in the on-diagonal elements σ'_i^2.

Given C' and R, a set of models with varying values for s_0 and γ can be projected into
the principal component basis via b'_model = R^T b_model for comparison to the measured
b' = R^T b. We find the value of each parameter that minimizes the expression



$$\chi^2 = \sum_{i=1}^{N_b} \frac{(b'_i - b'_{{\rm model},i})^2}{\sigma_i'^2} \qquad (8)$$
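A minimal sketch of the principal-component χ² of Equation (8), assuming the covariance matrix is symmetric and positive definite. It uses `numpy.linalg.eigh`, whose returned eigenvector columns play the role of R, with the eigenvalues as the variances σ'_i^2.

```python
import numpy as np

def principal_basis(cov):
    """Return (variances, R); R's columns are orthonormal eigenvectors of cov.

    In this basis C' = R^T C R is diagonal, with the variances on the diagonal.
    """
    variances, R = np.linalg.eigh(cov)
    return variances, R

def pca_chi2(b_obs, b_model, cov):
    """Equation (8): chi^2 computed in the principal-component basis."""
    variances, R = principal_basis(cov)
    d_prime = R.T @ (np.asarray(b_obs, dtype=float) - np.asarray(b_model, dtype=float))
    return float(np.sum(d_prime ** 2 / variances))
```

When every component is kept, this equals (b - b_model)^T C^-1 (b - b_model); discarding the noisiest components is a common variant, but it is not described in this section.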



Finally, we construct error ellipses to fully describe the likely parameter space of the
power-law model for the correlation function (Press et al. 1992).
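The contour levels for such error ellipses follow from the χ² distribution, as discussed in Press et al. (1992). The sketch below computes the Δχ² above the minimum that encloses the Gaussian 1σ, 2σ, and 3σ probabilities: n² for a single parameter, and -2 ln(1 - P) for two jointly varying parameters, because the χ² CDF with two degrees of freedom is 1 - exp(-x/2). The function names are illustrative.

```python
import math

def delta_chi2_one_param(n_sigma):
    """Δχ² for one free parameter (1 degree of freedom): n^2."""
    return float(n_sigma) ** 2

def delta_chi2_two_params(n_sigma):
    """Δχ² enclosing the Gaussian n-sigma probability for two joint parameters.

    P(n) = erf(n / sqrt(2)); with 2 degrees of freedom the χ² CDF is 1 - exp(-x/2),
    so x = -2 ln(1 - P) = -2 ln(erfc(n / sqrt(2))).
    """
    return -2.0 * math.log(math.erfc(n_sigma / math.sqrt(2.0)))

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, delta_chi2_one_param(n), round(delta_chi2_two_params(n), 2))
```

For two parameters this gives approximately 2.30, 6.18, and 11.8, the usual joint 1σ, 2σ, and 3σ levels.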
3.4. Obtaining the Real-Space Correlation Function

From the three-dimensional galaxy coordinates available to us, we can construct
ξ(σ, π). This calculation of ξ(σ, π) is the fundamental measurement upon which the results
presented in the rest of this work are based. The resulting image, shown in Figure
3 with contours overplotted, clearly reveals the redshift-space distortions that make
estimating the real-space correlation function difficult. Along the radial coordinate π, the
contours appear weakly stretched at small angular separation σ because of the Eddington
effect in clusters, though because HI-selected galaxies are known to avoid dense cluster regions,
this effect is less prominent for α.40 than for optically selected samples. In the other
dimension, the contours are flattened along π on large scales because of the coherent motion
of galaxies towards attractors. This 'squeezing' effect is determined by the clustering bias
(with respect to dark matter) of the sample and by the underlying matter density fluctuations.
In a future work, we will explore the shape of ξ(σ, π) to model the matter density field.

[Figure 3: image of ξ(σ, π), with σ (h^-1 Mpc) on the horizontal axis and π on the vertical axis, both spanning -20 to 20.]

Fig. 3. — The two-dimensional correlation function ξ(σ, π) from α.40, with separations
measured in h^-1 Mpc; brighter colors indicate stronger clustering.

In order to obtain the real-space correlation function ξ(r), we must take the intermediate
step of projecting ξ(σ, π) along the π axis (in practice, using discrete bins of size
Δπ), resulting in what is known as the 'projected correlation function', symbolized as
Ξ(σ)/σ:

$$\Xi(\sigma) = 2\int_0^{\pi_{\max}} \xi(\sigma, \pi)\,d\pi \approx 2\sum_i \xi(\sigma, \pi_i)\,\Delta\pi \qquad (9)$$



Following previous work, the maximum value of the integration, π_max, is selected so
that the summation is convergent but is kept as low as possible to avoid the introduction
of noise from poorly measured intermediate scales. From the original estimation of
ξ(σ, π), which counted pairs up to distances of ~60 h^-1 Mpc, we carry the sum along the π
axis up to a scale of π_max = 29.7 h^-1 Mpc. We confirm that the resulting correlation
function is not strongly sensitive to the chosen value of π_max, although extending the
integration to scales that are too large for the sample to measure sufficiently introduces
scatter and noise into the correlation function estimate.
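The discrete form of Equation (9) is a weighted sum over π bins up to π_max. This is a sketch, assuming ξ(σ, π) is stored on a grid with π ≥ 0 only (measurements at ±π folded together) and that π_max falls on a bin edge; the array layout and function name are assumptions.

```python
import numpy as np

def projected_xi(xi_sigma_pi, pi_edges, pi_max):
    """Equation (9): Xi(sigma) = 2 * sum_i xi(sigma, pi_i) * dpi_i.

    xi_sigma_pi : (n_sigma, n_pi) array of ξ(σ, π) for π >= 0
    pi_edges    : n_pi + 1 bin edges along π (h^-1 Mpc)
    pi_max      : only bins whose upper edge is <= pi_max are summed
    """
    xi_sigma_pi = np.asarray(xi_sigma_pi, dtype=float)
    pi_edges = np.asarray(pi_edges, dtype=float)
    dpi = np.diff(pi_edges)
    keep = pi_edges[1:] <= pi_max + 1e-9          # small tolerance for float edges
    return 2.0 * np.sum(xi_sigma_pi[:, keep] * dpi[keep], axis=1)
```

Raising `pi_max` adds bins where ξ(σ, π) is small and poorly measured, which is the trade-off between convergence and noise described above.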

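Ξ(σ) is closely related to the function in which we are truly interested, ξ(r), where
r is the real-space distance, via:

$$\frac{\Xi(\sigma)}{\sigma} = \frac{2}{\sigma}\int_\sigma^\infty \frac{r\,\xi(r)\,dr}{\sqrt{r^2 - \sigma^2}} \qquad (10)$$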



In order to evaluate the real-space correlation function, some assumptions must be
made about its form. Two options are usually explored in the literature: a power-law
form, or a stepwise-function form that makes no assumptions about shape but does
assume that the binning used represents an underlying smooth correlation function well
(i.e., the 'direct inversion' method). If we assume a power law of the form ξ(r) = (r/r_0)^-γ,
we find:



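$$\frac{\Xi(\sigma)}{\sigma} = \left(\frac{r_0}{\sigma}\right)^{\gamma} \frac{\Gamma(1/2)\,\Gamma((\gamma-1)/2)}{\Gamma(\gamma/2)} \qquad (11)$$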



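In Equation (11), Γ is the Gamma function. Equation (11) can
be recast in terms of fitting parameters:

$$\frac{\Xi(\sigma)}{\sigma} = \left(\frac{r_0}{\sigma}\right)^{\gamma} A(\gamma), \qquad A(\gamma) = \frac{\Gamma(1/2)\,\Gamma((\gamma-1)/2)}{\Gamma(\gamma/2)} \qquad (12)$$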



Following Meyer et al. (2007), we rearrange Equation (12), obtain the best-fit power
law of the form Ξ(σ) = a_1 σ^(1-γ) using the minimization given by Equation (8), and then
relate those parameters to r_0 and γ, which represent the best-fit power law for ξ(r).
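The conversion between the fitted amplitude a_1 of Ξ(σ) = a_1 σ^(1-γ) and r_0 follows from Equation (12): a_1 = r_0^γ A(γ). The sketch below implements that relation with the Python standard library, plus a numerical evaluation of Equation (10) after the substitution r = σ cosh u (which removes the integrable singularity at r = σ), usable as a check of Equation (11) for γ > 1. Names and quadrature settings are illustrative.

```python
import math

def A_gamma(gamma):
    """A(γ) = Γ(1/2) Γ((γ-1)/2) / Γ(γ/2) from Equation (12); requires γ > 1."""
    return math.gamma(0.5) * math.gamma((gamma - 1.0) / 2.0) / math.gamma(gamma / 2.0)

def a1_from_r0(r0, gamma):
    """Amplitude of Ξ(σ) = a1 σ^(1-γ): a1 = r0^γ A(γ)."""
    return r0 ** gamma * A_gamma(gamma)

def r0_from_a1(a1, gamma):
    """Invert the relation above: r0 = (a1 / A(γ))^(1/γ)."""
    return (a1 / A_gamma(gamma)) ** (1.0 / gamma)

def projected_numeric(sigma, r0, gamma, u_max=60.0, n=200000):
    """Equation (10) for ξ(r) = (r/r0)^-γ, evaluated numerically.

    With r = σ cosh(u): Ξ(σ) = 2 ∫_0^∞ σ cosh(u) ξ(σ cosh u) du,
    computed here with the midpoint rule and truncated at u_max.
    """
    h = u_max / n
    total = math.fsum(
        sigma * math.cosh((k + 0.5) * h) * (sigma * math.cosh((k + 0.5) * h) / r0) ** (-gamma)
        for k in range(n))
    return 2.0 * total * h

if __name__ == "__main__":
    a1 = a1_from_r0(3.3, 1.51)
    print(a1, r0_from_a1(a1, 1.51))
    print(projected_numeric(2.0, 3.3, 1.51), a1 * 2.0 ** (1 - 1.51))
```

The numerical and analytic values agree closely, which confirms that fitting a_1 and γ to Ξ(σ) is equivalent to fitting r_0 and γ for ξ(r).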

In the next section, we derive and discuss ^(r) using the mechanisms described in 
this section. 



4. Results: Clustering in α.40

4.1. Ξ(σ)/σ and ξ(r) Assuming a Power-Law Model

The projected correlation function Ξ(σ)/σ (recast for the figure and the fitting as Ξ(σ))
is displayed in Figure 4, along with error bars reflecting the on-diagonal elements of the
full covariance matrix. The dashed line is the best-fit model obtained by minimization
using the full covariance matrix. In Table 1, we list the parameters for the fit and their
uncertainties, along with the fits obtained if only the on-diagonal elements (the standard
deviations, σ) are used to carry out a standard least-squares fit. For comparison, we also
include the clustering reported by the HIPASS team (Meyer et al. 2007; note that those
authors ignored the off-diagonal elements in their error analysis), the clustering found
by Basilakos et al. (2007) using the same HIPASS dataset, and the clustering of several
optically selected samples of interest. We also display the Passmoor et al. (2011) results,
which used a small, publicly available early subset of the ALFALFA data.

Error ellipses are displayed in Figure 5, with the dashed contour giving the 1σ
single-parameter uncertainties listed in Table 1.

While both the full covariance analysis and the assumption of bin independence give
similar results, the larger uncertainties from the full covariance analysis indicate
the need to be conservative. Parameter uncertainties previously reported in the literature
that neglect the off-diagonal covariance (e.g., Meyer et al. 2007) significantly underestimate
the true statistical uncertainties. Even with the greater sensitivity, larger sample size, and
deeper redshift range of α.40, the correlation function analysis allows for quite a large range
of clustering scenarios.



We confirm the HIPASS result that HI-selected galaxies are among the most weakly
clustered known class of galaxies, most comparable to, but still less clustered than, the
faint late-type subsample in 2dFGRS and the IRAS galaxy redshift survey, which was
also biased toward star-forming galaxies. Similar to the results we will present in Section
6 for α.40, the flux-limited sample of IRAS galaxies considered in Fisher et al. (1994a) was
found to be antibiased relative to cold dark matter on small scales but unbiased on
intermediate scales (~10 h^-1 Mpc) and positively biased on the largest scales (beyond
10 h^-1 Mpc). Similarly, the QDOT sample (Moore et al. 1994), also taken from the IRAS parent
catalog but based on a lower flux limit and employing a different sampling strategy, was
found to be unbiased with respect to dark matter.

Table 1. Best-Fit Correlation Function Power-Law Models

Fitting Method / Sample           r_0 (h^-1 Mpc)      γ
Full Covariance                   3.3 (+0.3, -0.2)    1.51 (± 0.09)
On-Diagonal Only                  3.2 (± 0.1)         1.48 (± 0.03)
Passmoor ALFALFA^a                2.3 (± 0.6)         1.6 (± 0.1)
HIPASS^b (2007)                   3.5 (± 0.3)         1.47 (± 0.08)
HIPASS^c                          3.3 (± 0.3)         1.4 (± 0.2)
2dFGRS late-type faint^d          3.7 (± 0.8)         1.8 (± 0.1)
SDSS Bright^e                     6.2 (± 0.2)         1.85 (± 0.03)
SDSS Faint^e                      3.5 (± 0.3)         1.92 (± 0.05)
IRAS All-Sky (real space)^f       3.76 (± 0.20)       1.66 (± 0.10)
QDOT^g                            3.87 (± 0.32)       1.11 (± 0.09)
Pisces-Perseus early types^h      8.35 (± 0.75)       2.05 (± 0.10)
Pisces-Perseus late types^h       5.55 (± 0.45)       1.73 (± 0.08)

^a Passmoor et al. (2011)
^b Meyer et al. (2007)
^c Basilakos et al. (2007)
^d Norberg et al. (2002). We include the second-faintest sample due to a warning in Norberg et al. (2002) that the faintest (and smallest) sample provides poorly constrained fits.
^e Zehavi et al. (2005)
^f Fisher et al. (1994a)
^g Moore et al. (1994)
^h Guzzo et al. (1997)

Fig. 4. — The projected correlation function Ξ(σ) from α.40. Error bars reflect the
on-diagonal elements of the full covariance matrix. The overplotted dashed line is the fit
from the full covariance analysis, with γ = 1.51 ± 0.09 and r_0 = 3.3 (+0.3, -0.2) h^-1 Mpc.



Guzzo et al. (1997) provide an interesting comparison to our findings, as that work was also
based on an analysis of 21 cm galaxy profiles observed with the Arecibo Observatory,
although it was based on an optically selected, magnitude-limited sample rather
than a blindly HI-selected sample as in this case. That work also samples a region that
partially overlaps with α.40. Guzzo et al. (1997) split their sample by morphological type
and determined the variation of clustering strength between early- and late-type (spiral
and irregular) galaxies in Zwicky's catalog within the Pisces-Perseus region. They found
that the early types were significantly more clustered than the late types, as reflected more
generally in Table 1, but their volume-limited sample is significantly more clustered than
the HI-selected ALFALFA sample.

Our findings for the clustering of HI-selected galaxies agree with previous results, particularly with our understanding that ALFALFA galaxies tend to be blue, spiral, late-type galaxies, which are already known to be weakly clustered (Norberg et al. 2002; Kauffmann et al. 2004; Zehavi et al. 2005). Apart from differences in the uncertainty estimates, the clustering of α.40 agrees with the HIPASS findings but not with Passmoor et al. (2011), which is not unexpected given the weaknesses of the latter, in particular its extremely small sample size.

Fig. 5. χ² contours for γ and r0 (h^-1 Mpc). The dashed contour gives the 1σ projected uncertainties on γ and r0 as single free parameters, and the solid contours give joint 1, 2 and 3σ fits, respectively, to the pair of free parameters.
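Contours like those in Figure 5 come from a χ² surface evaluated over a grid in (γ, r0). The sketch below is a minimal illustration, assuming a power-law model for Σ(σ), a data vector d, and a covariance matrix C (in the paper these come from the full covariance analysis; here they are synthetic placeholders): χ² = (d - m)^T C^-1 (d - m), with joint 1, 2, 3σ regions for two parameters at Δχ² = 2.30, 6.18, 11.83 and the projected single-parameter 1σ range at Δχ² = 1. All function names are hypothetical.

```python
import math
import numpy as np

def projected_model(sigma, r0, gamma):
    """Sigma(sigma) for xi(r) = (r/r0)**(-gamma); requires gamma > 1."""
    amp = math.gamma(0.5) * math.gamma((gamma - 1) / 2) / math.gamma(gamma / 2)
    return sigma * amp * (r0 / sigma) ** gamma

def chi2_grid(sigma, data, cov, gamma_grid, r0_grid):
    """chi^2 = (d - m)^T C^-1 (d - m) on a (gamma, r0) grid."""
    icov = np.linalg.inv(cov)
    chi2 = np.empty((len(gamma_grid), len(r0_grid)))
    for a, g in enumerate(gamma_grid):
        for b, r0 in enumerate(r0_grid):
            resid = data - projected_model(sigma, r0, g)
            chi2[a, b] = resid @ icov @ resid
    return chi2

# Delta chi^2 for joint 1, 2, 3 sigma regions with two free parameters.
DELTA_CHI2_JOINT = (2.30, 6.18, 11.83)

def projected_1sigma(grid_values, chi2, axis):
    """1-sigma range of one parameter after minimizing chi^2 over `axis` (Delta chi^2 = 1)."""
    profile = chi2.min(axis=axis)
    inside = np.asarray(grid_values)[profile <= profile.min() + 1.0]
    return inside.min(), inside.max()

# Synthetic, noiseless example with a diagonal covariance (10% errors).
sigma = np.logspace(-0.5, 1.0, 10)            # h^-1 Mpc
data = projected_model(sigma, 3.3, 1.51)
cov = np.diag((0.1 * data) ** 2)
gamma_grid = np.linspace(1.3, 1.7, 41)
r0_grid = np.linspace(3.0, 3.8, 81)
chi2 = chi2_grid(sigma, data, cov, gamma_grid, r0_grid)
best = np.unravel_index(np.argmin(chi2), chi2.shape)
r0_lo, r0_hi = projected_1sigma(r0_grid, chi2, axis=0)  # minimize over gamma
```

With real bootstrap covariances the inverse can be noisy when there are many bins, so the number of σ bins matters as much as the grid resolution.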



4.2. ξ(r) via the Inversion Method and the Shape of the Correlation Function



ξ(r) is simply related to Σ(σ)/σ only if we assume that the underlying physics of galaxy formation dictates a power-law form for ξ(r). In the previous section, we calculated the correlation function of gas-rich galaxies under that assumption, but it is also possible to avoid it and obtain ξ(r) by direct inversion of the projected correlation function Σ. The inversion method tests the power-law assumption, though it is a noisy measurement and results in large scatter. As an independent test of the shape of the correlation function, ξ(r) determined via this inversion method is especially useful at scales above ~2.5 h^-1 Mpc, where the correlation function shows features inconsistent with the power-law assumption.
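Under the power-law assumption the projection has a closed form, Σ(σ)/σ = (r0/σ)^γ Γ(1/2) Γ((γ-1)/2) / Γ(γ/2), the standard result for integrating ξ(r) = (r/r0)^-γ along the line of sight (we assume the paper's Σ uses this normalization). The Python sketch below evaluates it and recovers γ and r0 from a noiseless Σ(σ)/σ with an unweighted straight-line fit in log space. This is only an illustration of the algebra, not the full-covariance fit used for α.40, and the function names are hypothetical.

```python
import math
import numpy as np

def gamma_amplitude(gamma):
    """A(gamma) = Gamma(1/2) Gamma((gamma-1)/2) / Gamma(gamma/2); needs gamma > 1."""
    return math.gamma(0.5) * math.gamma((gamma - 1) / 2) / math.gamma(gamma / 2)

def projected_over_sigma(sigma, r0, gamma):
    """Sigma(sigma)/sigma implied by a real-space power law xi(r) = (r/r0)**(-gamma)."""
    return gamma_amplitude(gamma) * (r0 / np.asarray(sigma, dtype=float)) ** gamma

def fit_power_law(sigma, ratio):
    """Unweighted straight-line fit in log space; returns (r0, gamma).

    log10(Sigma/sigma) = log10 A(gamma) + gamma*log10(r0) - gamma*log10(sigma),
    so the slope is -gamma and the intercept then fixes r0.
    """
    slope, intercept = np.polyfit(np.log10(sigma), np.log10(ratio), 1)
    gamma = -slope
    r0 = 10 ** ((intercept - math.log10(gamma_amplitude(gamma))) / gamma)
    return r0, gamma

sigma = np.logspace(-0.5, 1.0, 12)               # projected separations, h^-1 Mpc
ratio = projected_over_sigma(sigma, 3.3, 1.51)   # noiseless input for illustration
r0_fit, gamma_fit = fit_power_law(sigma, ratio)
print(round(r0_fit, 3), round(gamma_fit, 3))     # 3.3 1.51
```

On real data the fit would be weighted by the full covariance matrix; the log-space line is used here only because it makes the (r0, γ) relation explicit.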



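Following Meyer et al. (2007), Hawkins et al. (2003) and Saunders et al. (1992), we take our measurement of Σ(σ) to represent an underlying step function with values Σ_i in intervals with centers σ_i, rearrange Equation 10, and interpolate between bins to give:

$$\xi(\sigma_i) = -\frac{1}{\pi}\sum_{j \ge i}\frac{\Sigma_{j+1}-\Sigma_j}{\sigma_{j+1}-\sigma_j}\,\ln\left(\frac{\sigma_{j+1}+\sqrt{\sigma_{j+1}^2-\sigma_i^2}}{\sigma_j+\sqrt{\sigma_j^2-\sigma_i^2}}\right) \qquad (13)$$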



The sum in Equation 13 is truncated so that σ_max = π_max, for the value of π_max used in projecting ξ(σ, π) into Σ(σ).
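A minimal NumPy sketch of Equation 13, assuming Σ is sampled at increasing bin centers σ_i and linearly interpolated between them, as described above. The function name is hypothetical. The check uses a noiseless power law whose projection is known analytically, so the inversion should recover ξ(r) = (r/r0)^-γ up to discretization and truncation error; the last bin has no interval above it and is left undefined.

```python
import math
import numpy as np

def invert_projected(sigma, Sigma):
    """Step-function inversion of Sigma(sigma), following Equation 13.

    sigma: increasing bin centers sigma_i; Sigma: projected values Sigma_i.
    Returns xi at each sigma_i; the final entry is NaN because the sum is
    truncated at sigma_max and no interval lies above it.
    """
    sigma = np.asarray(sigma, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    n = sigma.size
    # Slope of the linear interpolation in each interval [sigma_j, sigma_{j+1}].
    slopes = np.diff(Sigma) / np.diff(sigma)
    xi = np.full(n, np.nan)
    for i in range(n - 1):
        lo, hi = sigma[i:-1], sigma[i + 1:]   # interval edges for j = i .. n-2
        r2 = sigma[i] ** 2
        # Exact integral of 1/sqrt(s^2 - r^2) over each interval.
        log_term = np.log((hi + np.sqrt(hi**2 - r2))
                          / (lo + np.sqrt(np.clip(lo**2 - r2, 0.0, None))))
        xi[i] = -np.sum(slopes[i:] * log_term) / np.pi
    return xi

# Noiseless check: project a power law analytically, then invert it.
r0, gamma = 3.3, 1.51
amp = math.gamma(0.5) * math.gamma((gamma - 1) / 2) / math.gamma(gamma / 2)
sigma = np.logspace(-2, 2, 200)               # h^-1 Mpc; sigma_max = 100
Sigma = sigma * amp * (r0 / sigma) ** gamma   # Sigma(sigma) for xi = (r/r0)^-gamma
xi_inv = invert_projected(sigma, Sigma)
```

The truncation at σ_max drops the tail of the integral, so the recovered ξ is biased low near the upper end of the σ range; this is one reason the inversion is noisier at large scales.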

The projected correlation function and ξ(r) obtained by the inversion method are in excellent agreement, as shown in Figure 6, where the points are the inversion ξ(r) with on-diagonal uncertainties and the overplotted dashed line is the best-fit power-law model for the projected correlation function. As remarked earlier, the inversion method is clearly more vulnerable to variance; its use is motivated as a check on the assumed shape of the correlation function. Furthermore, because the power-law assumption is known to be useful only on small-to-intermediate scales, the correlation function obtained via the inversion method allows us to extend to large scales, ~30 h^-1 Mpc.




Fig. 6. ξ(r) obtained via the inversion method, extended to scales ~30 h^-1 Mpc, with the best-fit power-law model for the projected correlation function overplotted as a dashed line to demonstrate agreement. (Axes: ξ(r) versus r, h^-1 Mpc.)
By relaxing the power-law assumption, we can further examine divergence from a power-law shape, a well-known phenomenon found in clustering studies of other populations of galaxies. The ALFALFA correlation function shows a 'shoulder' at scales of ~ a few h^-1 Mpc, as observed by, e.g., Guzzo et al. (1991), Hawkins et al. (2003), and Zehavi et al. (2004). Under the assumption of an inflationary, cold dark matter universe and using a halo occupation distribution (HOD) model, Zehavi et al. (2004) infer that the well-known shoulder is due to two distinct regimes in which galaxy-galaxy pairs are counted. On large scales, pairs are counted from separate dark matter halos, while on small scales, pairs are counted in the same dark matter halo and are subject to nonlinearity.
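One simple diagnostic for such a shoulder, offered here as an illustrative assumption rather than the analysis used in this paper, is the local logarithmic slope γ_eff(r) = -d ln ξ / d ln r, which is constant for a pure power law and changes where the one-halo and two-halo regimes meet. The sketch below estimates it by finite differences on tabulated ξ(r) values (for example, inversion-method points); the function name is hypothetical, and on noisy data the slope would need smoothing or coarser binning.

```python
import numpy as np

def local_slope(r, xi):
    """Effective slope gamma_eff(r) = -dln(xi)/dln(r) via finite differences.

    Requires xi > 0. np.gradient uses central differences in the interior
    and one-sided differences at the two ends.
    """
    log_r = np.log(np.asarray(r, dtype=float))
    log_xi = np.log(np.asarray(xi, dtype=float))
    return -np.gradient(log_xi, log_r)

r = np.logspace(-1, 1, 25)                       # h^-1 Mpc
pure = (r / 3.3) ** -1.51                        # pure power law: constant slope
print(np.allclose(local_slope(r, pure), 1.51))   # True
```

A sum of two power laws with different slopes, a crude stand-in for separate one-halo and two-halo terms, produces a slope that drifts with r instead of staying flat.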

Because the galaxies probed by ALFALFA are gas-rich, and because an HI-selected sample is biased towards gas-dominated low-mass objects that would be classified as LSB (low surface brightness) dwarfs in an optical survey, we expect the characteristics of the single-halo regime to differ from those observed in an optically selected sample. For example, tidal interactions and stripping within dense halos, which would decrease the pair counts of HI-selected galaxies, would change the relative contributions of the two regimes. Watson et al. (2011) find that, for the general population of galaxies, the one-halo and two-halo contributions combine into a power-law correlation function only within a narrow range of mass and redshift. Given a different halo occupation distribution model as a function of galaxy properties, the shoulder or break from the power law would be more prominent. The ALFALFA correlation function therefore provides yet another approach from which we can better understand the evolution of HI and the distribution of gas-rich galaxies in the present universe. In a future paper, as the ALFALFA dataset continues to grow, we will present an HOD analysis of the ALFALFA correlation function as an extension of the present work.



4.3. Systematics and Methodology 

Our estimation of ξ(σ, π) involved two choices on which our results could depend: the logarithmic binning intervals for pair counting, and the value of π_max for projecting ξ(σ, π) into Σ(σ). Investigations of alternative schemes suggest that our results are not strongly dependent on either the binning or the choice of π_max, though π_max is specifically selected to lead to a stable solution for ξ without introducing noise and scatter from scales that are poorly probed by the α.40 sample.

We consider whether extreme redshift distortions nearby, for small values of cz_CMB, could be contaminating our results. To test this possibility, we repeat the measurement, this time excluding galaxies with cz_CMB < 2,000 km s^-1, and then with cz_CMB < 3,000 km s^-1. This reduced the sample size to ~9,300 and ~8,900, respectively, but we found no difference in the final fitting parameters r0 and γ. We conclude that there is no advantage to be gained in eliminating nearby galaxies from α.40 for the correlation function analysis.

The expression for J3 in Equation 5 requires an expression for the shape of the correlation function. In calculating ξ(σ, π), there is therefore a presumably small dependence on the as-yet unknown parameters s0 and γ. As briefly mentioned in Section 3, one possible way to avoid any potential problems is to use the fiducial optical sample parameters s0 = 5.0 and γ = 1.8 to calculate the parameters for an HI-selected sample, and then iterate towards a stable solution. In attempts to do this, we find no significant difference between the parameters estimated via these two methods, and confirm that ξ is not dependent on the precise form of J3; such iteration provides no advantage. We demonstrate this in Figure 7, which displays the error contours on the power-law fit parameters for a sample limited to cz_CMB > 2,000 km s^-1, with J3 computed using the approximate parameters estimated in Section 4.1. The results are very close to those in Figure 5, and the 1σ estimated parameters are identical.
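For a power law ξ(s) = (s/s0)^-γ with γ < 3, the volume integral J3(s) = ∫_0^s ξ(s') s'^2 ds' has the closed form s0^γ s^(3-γ) / (3 - γ). The sketch below compares J3-based pair weights for the fiducial optical parameters (s0 = 5.0, γ = 1.8) and values close to the HI-selected fit (3.4, 1.5). It assumes weights of the standard minimum-variance form w = 1/[1 + 4π n J3(s)] with a constant mean density n; that form, the value of n, and the function names are assumptions for illustration, and the exact expression in Equation 5 may differ.

```python
import numpy as np

def j3_power_law(s, s0, gamma):
    """J3(s) = integral_0^s xi(x) x^2 dx for xi(x) = (x/s0)**(-gamma); needs gamma < 3."""
    s = np.asarray(s, dtype=float)
    return s0**gamma * s ** (3.0 - gamma) / (3.0 - gamma)

def pair_weight(s, n_bar, s0, gamma):
    """Hypothetical minimum-variance weight w = 1 / (1 + 4*pi*n_bar*J3(s))."""
    return 1.0 / (1.0 + 4.0 * np.pi * n_bar * j3_power_law(s, s0, gamma))

s = np.logspace(-1, 1.5, 30)     # redshift-space separations, h^-1 Mpc
n_bar = 1e-2                     # assumed mean number density, h^3 Mpc^-3
w_optical = pair_weight(s, n_bar, 5.0, 1.8)   # fiducial optical parameters
w_hi = pair_weight(s, n_bar, 3.4, 1.5)        # approximate HI-selected parameters
ratio = w_hi / w_optical                       # how much the weighting changes with s
```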




Fig. 7. χ² contours for γ and r0 (h^-1 Mpc), excluding all galaxies with cz_CMB < 2,000 km s^-1. The parameters used in J3 are approximations of the HI-selected r0 (3.4 h^-1 Mpc) and γ (1.5). The dashed contour gives the 1σ projected uncertainties on each parameter, and the solid contours give 1, 2 and 3σ fits, respectively, to the pair of parameters.



5. Discussion 



5.1. Comparison with Mock Catalogs 



The correlation function of gas-rich galaxies has implications for the improvement of galaxy simulations, since it provides an observational constraint on their results. This work will allow a better match between simulations and the observed relationship between gas mass and clustering properties. Simulations are just now progressing to the point where reasonable, realistic cold HI gas masses can be assigned to galaxies. In this section, we compare the results of the correlation analysis of α.40 with presently available cold dark matter simulations.

We are limited in our ability to compare our observations to simulations by what is publicly available. Martin et al. (2010) took advantage of the Obreschkow et al. (2009; hereafter O09) simulation, which assigned cold gas to galaxies from the De Lucia & Blaizot (2007) catalog of Millennium Simulation galaxies. In that work, we found that O09's simulation provided a reasonable fit to the observed HI mass function. However, this catalog may not be adequate for comparison with an observed correlation function. In particular, O09 caution that the mass resolution of the simulation prevents them from applying their findings to faint, low surface brightness, or low-mass galaxies. Given the known correlations between galaxy type and clustering, and between HI mass and luminosity, it would be difficult to use this catalog to explore the relationship between current simulations and ALFALFA's observations. Furthermore, O09 did not themselves carry out this analysis.



Kim et al. (2011) have provided another option for comparison, using a set of four GALFORM semi-analytic models that treat a range of processes which influence gas reservoirs, including cooling, ram-pressure stripping, mergers, star formation, and supernova feedback. They report results over a range of redshifts, but for this work only their results at z = 0 are of interest. They find that the galaxy-galaxy correlation function of the simulations is consistent with that found for HIPASS, and confirm that their simulations show gas-rich galaxies as being significantly less clustered than dark matter. Differences between the models, and the scales at which those differences are important, can be used to highlight potential problems in the assumptions, such as models that overpredict the gas richness of satellites.



In Figure 8, we compare the models presented in Kim et al. (2011) with the observed α.40 correlation function for HI-selected galaxies. The models include that of Bower et al. (2006), labeled Bow06; a modified version of the same, labeled MHIBow06; a version that uses a slightly different background cosmology in better agreement with the WMAP parameters, labeled GpcBow06; and, finally, the model of Font et al. (2008), labeled Font08. In all models, only galaxies with M_cold > 10^[...] h^[...] M_⊙, where M_HI = 0.76 M_cold/(1 + 0.4), are included. This matches the HIPASS galaxy selection but may be more massive than would be ideal for matching α.40, which probes very small gas masses (Martin et al. 2010; Haynes et al. 2011). The models are described in detail in Kim et al. (2011); here we only discuss the main differences that may be relevant for a comparison to α.40.
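As a worked example of the cold-gas conversion quoted above, M_HI = 0.76 M_cold/(1 + 0.4) ≈ 0.543 M_cold. Reading 0.76 as a hydrogen mass fraction and 0.4 as an H2-to-HI mass ratio is our interpretation, not something stated here. The helper name is hypothetical, and because the mass-threshold exponent is not recoverable from the text, no threshold is applied.

```python
def hi_mass_from_cold(m_cold, hydrogen_fraction=0.76, h2_to_hi=0.4):
    """M_HI = hydrogen_fraction * M_cold / (1 + h2_to_hi); same mass units as m_cold."""
    return hydrogen_fraction * m_cold / (1.0 + h2_to_hi)

# A cold-gas mass of 1.4e9 corresponds to roughly 7.6e8 in HI under these assumptions.
m_hi = hi_mass_from_cold(1.4e9)
```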
Fig. 8. Models for ξ(σ) from Kim et al. (2011), compared with α.40 (filled points with error bars).



Bow06, MHIBow06, and Font08 all use the Millennium Simulation to track galaxies and halos, while GpcBow06 uses a different method involving merger trees and a large box size. Bow06 and Font08 are able to match optical luminosity functions, but both overpredict the abundance of HI in low-mass galaxies. MHIBow06 was created by adjusting the star formation timescale in Bow06, thereby fixing this excess while maintaining agreement with the optical properties of galaxies. GpcBow06, finally, also has a modified star formation prescription, which fits the HI mass function better than Bow06.

Figure 8 shows that MHIBow06 and Bow06 fit the observed HI correlation function on small scales, while Font08 drastically overpredicts and GpcBow06 drastically underpredicts the strength of clustering for gas-rich galaxies on those scales. At large scales, both GpcBow06 and Bow06 underpredict the clustering strength, while Font08 and MHIBow06 follow it closely. Although not a perfect match, the MHIBow06 model appears to be most consistent with the clustering of gas-rich galaxies over the full range of accessible scales.

Part of these differences may be due to the mass resolution of the models, given that α.40 probes significantly lower masses than the HIPASS survey for which these models were designed. What is clear, however, is that α.40 already provides constraints that can begin to differentiate between successful and unsuccessful models, and the full ALFALFA sample should provide very robust constraints for testing simulations that take HI into account.



5.2. The Bias Parameter for HI-Selected Galaxies

The bias between any two classes of objects indicates their relative clustering strength. 
For cosmological purposes, we are interested in comparing the clustering of types of 
galaxies with the underlying dark matter halo distribution, in order to understand how 
well future surveys would probe the true (baryonic + dark) mass distribution. The comparison is achieved through the linear bias parameter at z = 0, b0:

$$\xi_{\rm gal}(r) = b_0^2\,\xi_{\rm DM}(r) \qquad (14)$$



Linear theory predicts that b0 is independent of scale. For real galaxies, however, this is expected to hold only above intermediate scales of ~10 h^-1 Mpc. If b0 > 1.0, as is the case for red galaxies, which tend to be found in clusters (e.g., Guzzo et al. 1997; Norberg et al. 2002; Zehavi et al. 2005; Li et al. 2006; Swanson et al. 2008), then the distribution is positively biased with respect to the dark matter. For populations such as HI-selected galaxies, b0 < 1.0, and they are said to be antibiased.

As a proxy for the underlying dark matter distribution at z = 0, we use the correlation function of dark matter halos from the Millennium Simulation, given in Springel et al. (2005) as a function over the same scales that we are interested in. The Millennium Simulation, however, used an early WMAP estimate of σ8 ≈ 0.9, which is generally recognized to have been high. We have adjusted our calculations to use the recommended value σ8 = 0.8 from the Seven-Year WMAP (WMAP7) results (Larson et al. 2011).
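The adjustment procedure is not spelled out here. One simple approach, assumed for illustration only, uses the linear-theory scaling ξ ∝ σ8², so the simulated ξ_DM(r) is multiplied by (0.8/0.9)² ≈ 0.79. This scaling is only approximate on small, nonlinear scales, where a more careful treatment would be needed. The function name and the input values are hypothetical placeholders, not Millennium Simulation output.

```python
import numpy as np

def rescale_xi_sigma8(xi, sigma8_old=0.9, sigma8_new=0.8):
    """Linear-theory rescaling of a correlation function: xi is proportional to sigma8**2."""
    return np.asarray(xi, dtype=float) * (sigma8_new / sigma8_old) ** 2

xi_dm_millennium = np.array([10.0, 1.0, 0.1])   # placeholder values, not simulation output
xi_dm_wmap7 = rescale_xi_sigma8(xi_dm_millennium)
```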



In Figure 9 we compare that correlation function to the α.40 measurement of ξ(r) for HI-selected galaxies using the inversion method. The dark matter correlation function deviates strongly from a power law on small scales; since the galaxy correlation function is close to a power law, the ratio of the two must vary with scale, reflecting the well-known fact that bias is scale-dependent. Dark matter is, as expected, significantly more strongly clustered than this particular population of galaxies, but the bias becomes less significant at intermediate and large scales.




Fig. 9. The real-space correlation function ξ(r) for dark matter from the Millennium Simulation (solid line, adjusted for the WMAP7 value of σ8) and for HI-selected galaxies from α.40 (direct inversion method points with error bars, along with the best-fit power law plotted as a dashed line; model fit given in Table 1). (Horizontal axis: r, h^-1 Mpc.)



In Figure 10 we display the bias parameter as a function of scale. The error bars are based only on the α.40 uncertainties and assume that there is no uncertainty in the Millennium Simulation's measurement of the correlation function. This figure reflects what we already understand about the clustering properties of HI-selected galaxies: on small scales, the clustering of gas-rich galaxies is weaker, and on ever-larger scales the distribution of gas-rich galaxies begins to more closely reflect the underlying matter distribution. Basilakos et al. (2007) measured the linear bias parameter on large scales for the HIPASS sample using a different technique. Exploring modeled dark matter power spectra for different values of b0, including bias and assuming a concordance cosmology, they identified the most likely bias parameter. They found b0 = 0.7 ± 0.1, in general agreement with the findings for α.40, though we have measured the bias parameter as a function of scale. The preliminary work of Passmoor et al. (2011) is generally consistent with this result, though that work does not capture our finding that the sample becomes unbiased on large scales.
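Following Equation 14, the scale-dependent bias is b0(r) = [ξ_gal(r)/ξ_DM(r)]^(1/2). A minimal sketch, assuming (as for Figure 10) that ξ_DM carries no uncertainty, so first-order error propagation gives σ_b = σ_ξ / [2 (ξ_gal ξ_DM)^(1/2)]. The function name and inputs are illustrative, and both correlation functions are assumed positive at the scales used.

```python
import numpy as np

def bias_profile(xi_gal, xi_gal_err, xi_dm):
    """Return b0(r) = sqrt(xi_gal/xi_dm) and its 1-sigma error.

    Only the galaxy uncertainty is propagated, using
    db/dxi_gal = 1 / (2 * sqrt(xi_gal * xi_dm)).
    """
    xi_gal = np.asarray(xi_gal, dtype=float)
    xi_gal_err = np.asarray(xi_gal_err, dtype=float)
    xi_dm = np.asarray(xi_dm, dtype=float)
    b = np.sqrt(xi_gal / xi_dm)
    b_err = xi_gal_err / (2.0 * np.sqrt(xi_gal * xi_dm))
    return b, b_err

# Illustrative numbers only: an antibiased point (b < 1) at two scales.
b, b_err = bias_profile([0.5, 0.2], [0.1, 0.02], [2.0, 0.25])
```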
Fig. 10. The bias parameter b0(r) as a function of scale.



Because of the very limited redshift extent of ALFALFA and the α.40 sample, our work cannot comment on the evolution of clustering or bias for HI-selected objects. However, as a robust measurement of these properties at z = 0, it provides a benchmark observational constraint with which theoretical models will need to agree. The earliest work comparing the bias of HI at z = 0 to its anticipated evolution, by Basilakos et al. (2007), determined b0 ≈ 0.68 today and predicted that b would range from ~2 to 4 by z = 4. Recently, Marín et al. (2010) have used a different, simple bias model, incorporating observational constraints, that relates M_HI to M_DM to estimate HI masses of Millennium Simulation halos, and then investigated the bias of HI with respect to the halo distribution. At z = 0, they estimate that the overall linear bias parameter on large scales is ~0.8, slightly higher than our findings at those scales. Their Figure 6 is more directly comparable to our Figure 10 and shows the same overall rise of the bias with increasing scale found here. Their models also predict that the bias will rise sharply with redshift, with the linear bias parameter reaching b ~ 2 by z ~ 4.



The α.40 observations and the Marín et al. (2010) predictions have implications for large-scale 21 cm galaxy surveys and intensity mapping projects with such instruments as the Square Kilometre Array (SKA). If the theoretical results reflect the true evolution of HI gas in the universe, then these projects can expect strong 21 cm signals at a range of redshifts. Perhaps more importantly, the α.40 observation of the correlation function at low redshift provides a robust baseline constraint for the development of SKA model predictions, and future simulations will need to match both the HI mass function and the correlation function at z ~ 0.

Because the HI-selected galaxy bias is likely to be strongly dependent on HI mass, with low-mass objects severely antibiased with respect to the underlying dark matter distribution, high-redshift surveys that are sensitive only to the high-mass end of the HIMF should expect their samples to be mildly antibiased at low redshifts and increasingly positively biased at intermediate to high redshifts. We will explore the mass, color, and luminosity dependence of the correlation function in a future work.



6. Summary and Conclusions 

We have used the ~10,150-galaxy α.40 sample to measure the correlation function of HI-selected galaxies in the local universe. We use bootstrap resampling and a full covariance analysis to model the real-space correlation function on scales < 10 h^-1 Mpc as a power law, ξ(r) = (r/r0)^-γ. We find γ = 1.51 ± 0.09 and a clustering scale length r0 = 3.3 (+0.3, -0.2) h^-1 Mpc. Furthermore, we show using a direct inversion method that the observed α.40 real-space correlation function closely follows this power law. The direct inversion method also allows exploration of the divergence from a single power law, seen as a 'shoulder' in the correlation function at scales of ~ a few h^-1 Mpc. Our findings are shown to be robust against the precise form of the weighting used in the pairwise estimation of ξ(σ, π) and against the α.40 sample selection criteria. The superior sensitivity of ALFALFA and its high selection function allow us to include the full survey redshift range (cz = 0 to 15,000 km s^-1) without introducing significant noise into the analysis.

The clustering of HI-selected galaxies is significantly weaker than that of general populations of optically selected galaxies, and is most closely comparable to samples of faint, late-type, blue, and/or star-forming galaxies found in optical (and infrared) surveys. Available models of HI in simulated galaxies are in general agreement with our observations, and the α.40 measurement of the correlation function is robust enough to begin constraining these models.

Finally, we measure the bias parameter for α.40, using the correlation function of dark matter haloes from the Millennium Simulation, and find that the small-scale clustering of HI galaxies is severely antibiased with respect to the underlying dark matter distribution. On large scales the antibiasing becomes only moderate. We suggest that isolating the high-mass galaxies in α.40 will show that this population more closely follows the true mass distribution, and that an abundance of low-mass galaxies in underdense voids partially explains the strong antibiasing observed.

The α.40 sample provides, for the first time, a robust measurement of the clustering of HI-selected galaxies, which can be used to provide observational constraints for theoretical models. While gas-rich galaxies are currently poorly modeled in N-body and semi-analytic simulations of the Universe at z = 0, this situation is likely to change given the results presented here and, especially, the full results when the ALFALFA catalog is complete with a sample of ~30,000 objects.



The models of Kim et al. (2011), which reproduce the clustering characteristics of the HIPASS sample, can now be exploited to attempt to understand the clustering revealed by ALFALFA galaxies. Conversely, we may find that these models cannot reliably reproduce the more complex characteristics of α.40, particularly the dependence on galaxy characteristics (e.g., HI mass, color). α.40 can therefore contribute to the improvement of these models, helping to close the gap between the extremely detailed optical characteristics of simulated galaxies and the poor understanding of where cold gas fits into the picture. To date, such models could only be loosely compared to HI-selected samples, given the lack of a large survey like ALFALFA and of robust measurements from cosmological simulations for these samples. Instead, these models typically focus on fitting the luminosities and stellar characteristics of observed galaxies, which are related to gas reservoirs (since gas fuels star formation), but only indirectly.

If simulations can be adjusted now that robust benchmarks exist for the z = 0 characteristics of HI-selected galaxies (e.g., this work; Martin et al. 2010; Papastergis et al. 2011; and others), this could constrain the allowed evolutionary tracks that the distribution of gas reservoirs may have followed. Further, the clustering of HI-selected galaxies, in particular high-mass galaxies, can be used to predict the strength of the signal that will be obtained with future intensity mapping projects, which will not resolve individual galaxies but will measure the bulk HI on ~10 Mpc scales. The α.40 measurement of the HI-selected galaxy bias indicates that, at low redshift, selecting high-HI-mass galaxies over large scales ensures a sample that adequately probes the underlying dark matter distribution.

These findings, and the potential for a more robust understanding of the role of gas in galaxy evolution, motivate further work in this area. We will further explore the clustering properties of gas-rich galaxies in future work; Papastergis et al. 2011 (in preparation) are analyzing the dependence of ξ(r) on such properties as galaxy color, gas fraction, luminosity, and HI mass. ALFALFA has a distinct advantage in exploring this dependence, given its high sensitivity across 5 orders of magnitude in HI mass, its blind ability to detect both low surface brightness galaxies and extremely large, bright spirals, its coverage of a cosmologically representative volume, and its overall sample size.